Sentiment Analysis

Machine Learning
Author

ThanachotL.

Published

June 29, 2022

This project is part of my university coursework. Here, I perform sentiment analysis, using a pre-trained model to predict whether a customer was satisfied with the service or not. I work with the Amazon Fine Food Reviews data set, which provides attributes such as the review text, summary, score, and profile name. I select only some of these attributes and split the data into a training set and a testing set. At the end, I report the accuracy of my model and discuss some challenges for improving its efficiency in the future.

#import libraries
#the dataset: Amazon Fine Food Reviews
import pandas as pd
import numpy as np
import seaborn as sns
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import datetime as dt
import datetime
import nltk
nltk.download()
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import re
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

Import the data set

#import dataset
df = pd.read_csv('Reviews.csv')
#see the sample of dataset
df.sample(100000)
Id ProductId UserId ProfileName HelpfulnessNumerator HelpfulnessDenominator Score Time Summary Text
440845 440846 B00152VHXG A3EMYFATMQ39M9 Aliby 0 0 5 1343606400 Weren't squashed! The tub is nice because the candies don't get ...
200151 200152 B002N4ID5K A3LF3AIVUYK5UX Susan C Schnur 0 0 5 1316736000 Always the go-to chew in our house I have a very aggressive 5 month old chewer. ...
370957 370958 B000BJPEE2 A3FAL7ERHLQ63Q rayster 2 2 5 1306368000 Clearbrook Farms Wild Maine Blueberry Preserves Clearbrook Farms Wild Maine Blueberry Preserve...
126828 126829 B001EQ5H6Q A3VMWM0R3J70VZ Jonathan B 2 2 5 1332633600 A Must For any Cook I have used all of More Than Gourmet's product...
123789 123790 B000ILA3FS A1ZJF9RBTMQAIY Bob Jetski 0 0 5 1335571200 Hull less popcorn? Amish Country Lady Finger popcorn is not hull ...
... ... ... ... ... ... ... ... ... ... ...
267087 267088 B004BKLHOS AUV3OR951650C N. Porter 1 1 5 1299801600 Wow, what a great flavor!! Having always eaten the Honey Maid Graham Crac...
362048 362049 B006N3HZ6K A1WTA8FDNV5O7L Anan Seven 2 2 2 1301616000 Try a sample first if you can I like strong coffee/espresso and was excited ...
517571 517572 B00954NY46 AOA1AJJZSDS01 C. Wagner "just one more gadget" 1 2 4 1302220800 Does The Job Jolt-wise, this is a nice, gentle introduction...
508258 508259 B006N3IE6A A35R32TA60XD57 M. Torma 2 2 5 1291852800 New favorite! This one will not disappoint! I got this yest...
368580 368581 B005K4Q1W2 A33L7DXL6YUAPU joally 0 0 1 1321660800 Horrible "fake" taste I recommend Green Mountain Hot Apple Cider ins...

100000 rows × 10 columns

Reshape and Explore Data

First, we need to explore the data and decide which attributes are essential. We can also visualize the data in order to understand it better.

#get the column name using list comprehension
print([col for col in df])  
['Id', 'ProductId', 'UserId', 'ProfileName', 'HelpfulnessNumerator', 'HelpfulnessDenominator', 'Score', 'Time', 'Summary', 'Text']
#the dataset is large, and using all of it requires more resources
#disclaimer: in this project, I use only one-fifth of the dataset
df.shape 
(568454, 10)
df0 = df.sample(frac = 0.20) # taking 20% of dataset
df0 = df0[['Id','ProfileName','Score', 'Time', 'Summary', 'Text']] # keep only some attributes
df0.head()
Id ProfileName Score Time Summary Text
210335 210336 Lisa M. Langrehr "Phillygirl" 1 1281052800 not sure if it was from this food based on reviews and ingredients ordered this ...
47203 47204 Alison Trotta 5 1316131200 Love these! My cat loves these treats. Whenever she sees t...
160712 160713 manaalaq 4 1341014400 Best dog food we have fund Wellness is by far the best dogfood we have ev...
276786 276787 Ronda the Shopper 5 1259020800 Yum yum lover These noodles are great! The only thing missi...
290333 290334 fake name 2 1308182400 Not original flavor as advertised I recently visited a Cracker Barrel and bought...
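Because `df.sample(frac = 0.20)` draws a different subset on every run, the row counts and example reviews shown in this post cannot be reproduced exactly. A minimal sketch of a repeatable draw, assuming a small toy frame in place of `Reviews.csv` and an arbitrary seed of 42 (neither comes from the original notebook):

```python
import pandas as pd

# Toy stand-in for the reviews table (hypothetical data, not the real file).
toy = pd.DataFrame({"Id": range(1, 11), "Score": [5, 4, 5, 1, 3, 5, 2, 5, 4, 5]})

# Passing random_state fixes the pseudo-random draw, so the same rows
# are selected every time the cell is re-run.
sample_a = toy.sample(frac=0.20, random_state=42)
sample_b = toy.sample(frac=0.20, random_state=42)

print(len(sample_a))              # 20% of the 10 toy rows
print(sample_a.equals(sample_b))  # identical rows in identical order
```

Any fixed integer works as the seed; what matters is that it stays the same between runs.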
# create an id of dataset (more organized)
id = np.arange(0,df0.shape[0]) 
id.shape
(113691,)
df0['id'] = id # insert new_id that has been created
df0.set_index("id", inplace = True) #setting as index_column
df0.pop('Id') # taking out the old one
df0
ProfileName Score Time Summary Text
id
0 Lisa M. Langrehr "Phillygirl" 1 1281052800 not sure if it was from this food based on reviews and ingredients ordered this ...
1 Alison Trotta 5 1316131200 Love these! My cat loves these treats. Whenever she sees t...
2 manaalaq 4 1341014400 Best dog food we have fund Wellness is by far the best dogfood we have ev...
3 Ronda the Shopper 5 1259020800 Yum yum lover These noodles are great! The only thing missi...
4 fake name 2 1308182400 Not original flavor as advertised I recently visited a Cracker Barrel and bought...
... ... ... ... ... ...
113686 lazy cook 4 1246665600 mmmmmmm, tasty not as good as trader joe's.<br /><br />heatin...
113687 C 4 1301529600 Old favourite in a S'mores-ready size These are a winner. As soon as I received the...
113688 David Jones "fly fisher" 4 1296691200 Good and Crunchy This new cereal tastes very good, although the...
113689 Katharine A. Mitchell 2 1271635200 Not worth the money This was not worththe $100.00. The toy broke w...
113690 Blowfishn "Clover'D'Alien" 1 1205452800 These Are Yuck! I never tasted bagel snacks that are so awful ...

113691 rows × 5 columns
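The re-indexing above can probably be written more compactly. The sketch below is one possible alternative, not the notebook's own code: it uses a three-row toy frame (values copied from the sample output, but otherwise hypothetical) and chains `drop`, `reset_index`, and `rename_axis` to get a fresh 0-based index named `id`.

```python
import pandas as pd

# Toy frame whose index is scrambled, as it is after df.sample().
toy = pd.DataFrame(
    {"Id": [210336, 47204, 160713], "Score": [1, 5, 4]},
    index=[210335, 47203, 160712],
)

# Drop the old 'Id' column, discard the scrambled index,
# and name the new 0-based index 'id'.
tidy = toy.drop(columns="Id").reset_index(drop=True).rename_axis("id")

print(tidy.index.tolist())
print(tidy.index.name)
```

This also avoids creating a variable called `id`, which shadows Python's built-in function of the same name.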

#rearrange the order of the attributes (to be more organized)
df1 = df0[['Time', 'ProfileName', 'Summary', 'Text', 'Score']] 
df1.head(20) 
Time ProfileName Summary Text Score
id
0 1281052800 Lisa M. Langrehr "Phillygirl" not sure if it was from this food based on reviews and ingredients ordered this ... 1
1 1316131200 Alison Trotta Love these! My cat loves these treats. Whenever she sees t... 5
2 1341014400 manaalaq Best dog food we have fund Wellness is by far the best dogfood we have ev... 4
3 1259020800 Ronda the Shopper Yum yum lover These noodles are great! The only thing missi... 5
4 1308182400 fake name Not original flavor as advertised I recently visited a Cracker Barrel and bought... 2
5 1341792000 Katja yummi for the cats We got 2 cats. 1 Siami and 1 Main Coon. Food i... 5
6 1300838400 AmazonMySavior My son's 2 and this is still his preferred bre... I mix cereal, cheese & hot milk, and my son go... 5
7 1270080000 B. Karnofsky The best Until trying Jet Fuel, I thought Van Houtten E... 5
8 1211241600 S. Slagle "I-just-want-it-to-work!" Yummy and not too sweet This is great cereal, with just a hint of swee... 5
9 1273881600 D. D. Lett "Daniel J Lett" Love Butternut Coffee I so Miss being able to get Butternut Coffee I... 5
10 1346544000 Moto Splendid Spicy Stuff! I've bought this before, but have had trouble ... 5
11 1235606400 Jerry P. Danzig Bait and Switch The idea of a carbonated 100% fruit juice seem... 3
12 1347926400 Everett Starkweather "Ev" Just what a diabetic needs -- more sugar! This cereal used to have nine grams of carbohy... 1
13 1336435200 sassafrass66 Doesn't get much better I was pet sitting recently and the dog I was s... 5
14 1341792000 tv2557 Yum This is an amazing cookie! I thought it was go... 5
15 1310342400 bryan lee It's going to expire! It says that this item should sell by july 11,... 1
16 1327276800 Pen Name expensive cliff crunch bars These are not original cliff bars, but the sma... 1
17 1341532800 RealLoveIs Good Snack! Not the healthiest snack of them all BUT certa... 5
18 1265846400 A reader in Pennsylvania Not for espresso I own fairly decent espresso equipment, an Ast... 1
19 1282262400 Dee Lightful Great Bar at Great Price Love these bars - satisfies my chocolate cravi... 5
df1.shape
(113691, 5)
#Explore the dataset
#Goal: visualize the proportion of reviews categorized by score
#show the proportion of each score as a percentage
score_prop = df1.groupby('Score')['Text'].count()/len(df1.Score)*100
round(score_prop)
Score
1     9.0
2     5.0
3     8.0
4    14.0
5    64.0
Name: Text, dtype: float64
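The same percentages can be obtained with `value_counts(normalize=True)`, which divides each count by the total for you. A minimal sketch on ten made-up scores (the data here is hypothetical and only illustrates the call):

```python
import pandas as pd

# Ten hypothetical review scores.
scores = pd.Series([5, 5, 5, 4, 1, 5, 3, 5, 2, 5], name="Score")

# normalize=True returns fractions instead of counts;
# sort_index() orders the result by score rather than by frequency.
share = scores.value_counts(normalize=True).sort_index() * 100

print(share.round())
```

One difference from the `groupby(...)['Text'].count()` version: `count()` skips rows whose `Text` is missing, while `value_counts` on `Score` counts every row with a score. Here `Text` has no missing values, so the two agree.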
#Visualize proportion of score with pie chart
# declaring data
x = score_prop.to_list()
data = x
keys = ['Score 1', 'Score 2', 'Score 3', 'Score 4', 'Score 5']
  
# define Seaborn color palette to use
palette_color = sns.color_palette('RdBu')
  
# plotting data on chart
plt.pie(data, labels=keys, colors=palette_color, autopct='%.0f%%')
  
# displaying chart
plt.show()

#NOTE: Reviews with Score 5 make up the majority of the plot, and this class imbalance could bias the model's predictions toward the majority class.

# Explore the data
# displaying the full text of reviews
with pd.option_context('display.max_colwidth', None):
  display(df1)
Time ProfileName Summary Text Score
id
0 1281052800 Lisa M. Langrehr "Phillygirl" not sure if it was from this food based on reviews and ingredients ordered this food to change out with wellness food, my 9 yr old cat developed stones in his bladder. he underwent surgery this past Tuesday. It will be a long touph recovery for him. he never had this problem until I mixed this food into his diet. it could be a coincidence but I think it is strange this happened about 1 month after starting this food. His stones have been sent to the lab to see what caused them from his diet. Anyhow the food does look good, just might not have worked for my little guy. 1
1 1316131200 Alison Trotta Love these! My cat loves these treats. Whenever she sees the bag, or hears it crinkles, she starts to meow. She has a hairball problem, and these treats help with that. 5
2 1341014400 manaalaq Best dog food we have fund Wellness is by far the best dogfood we have ever fed our yellow lab. Though it is expensive, we figure we spend the same amount or less than we would through purchasing a cheaper food as we would need to feed more for the same caloric intake. We get by with a thirty pound bag about once a month or so. Our lab's coat is healthy, he has high energy, and his "movements" are solid... May be too much information. However, that is one of the big differences I notice with cheaper dog foods. To help with further comparison, our lab is 8 years old, is an active hunter nine months out of the year (waterfowl spring and fall, rabbits, upland birds), and he is an indoor family dog. 4
3 1259020800 Ronda the Shopper Yum yum lover These noodles are great! The only thing missing was the wonderful atmosphere of the country! 5
4 1308182400 fake name Not original flavor as advertised I recently visited a Cracker Barrel and bought a box or Dubble Bubble Original Flavor bubble gum. I remember chewing this gum as a kid, it was my favorite gum because of its great flavor. After I had a taste for the gum again, I wanted more and I am 100 miles away from the nearest Cracker Barrel, so I checked on Amazon, my go to place. I was excited to see they had it and two separate listings. One for "original" and a regular flavor. The tub that I received is not the original flavor bubble gum, even though it specifically states that it is on the product listing. This is the plain, old, gross, generic flavor they changed to a couple of years ago.<br /><br />If you are looking for the cinnamon/clove flavor you remember growing up, this is NOT the gum you are looking for, you will be disappointed. 2
... ... ... ... ... ...
113686 1246665600 lazy cook mmmmmmm, tasty not as good as trader joe's.<br /><br />heating it by boiling does a more thorough job than the microwave. goes great over rice and/or with flat bread.<br /><br />just remember to brush your teeth if you have a date after, lest ye have green spinach mixed with your pearly white smile. 4
113687 1301529600 C Old favourite in a S'mores-ready size These are a winner. As soon as I received these I called some friends and arranged a S'mores making party around their firepit.<br /><br />Being the same classic graham cracker that the kids knew and loved, there were no complaints on the flavour, no complaints on oats being on the cracker or on the cracker being "different". Additionally, since the crackers are perfectly S'mores sized right out of the packet, there was less fuss about one kid having a more perfectly broken cracker than another, and fewer frazzled parents trying to juggle sticky burnt marshmallows while aiming for the perfect break along the serrated line for fussy kids.<br /><br />Another plus is that these are packaged in stacks of 8, which is both a perfect size for snacking and for toting around.<br /><br />This is frazzled parent and fussy kid approved. I will definitely be buying more for the summer camping trips. 4
113688 1296691200 David Jones "fly fisher" Good and Crunchy This new cereal tastes very good, although they are a bit on the sweet side for me. They don't have much of a peanut taste, which is fine with me. There strongest point is that they start crunchy and stay that way in milk. I really hate mushy cereal and these last a long time in milk. They have a honey taste that is very pleasing. I find them a good snack dry also.<br /><br />For adults, like me, they may be a bit too sweet, but for kids they are probably about right. I'm not a nutritionist, but I suspect that they are more nutritional than frosted flakes and taste much better. Give them a try, especially if your into sweet cereals. 4
113689 1271635200 Katharine A. Mitchell Not worth the money This was not worththe $100.00. The toy broke within one hour. The rest of the items were trading cards. 2
113690 1205452800 Blowfishn "Clover'D'Alien" These Are Yuck! I never tasted bagel snacks that are so awful tasting as these. Too bad I bought 2 of them. Poppy Seed flavor? I thought they have a fishy flavor to me. and like they say, they are twice baked.... just under burnt they are. YUCK.... at least my pigs do like them! They are a snack to them!! 1

113691 rows × 5 columns

#convert (int) timestamp to datetime
df1['Time'] = df1['Time'].apply(lambda x : datetime.datetime.fromtimestamp(x)) 
df1.head()
Time ProfileName Summary Text Score
id
0 2010-08-06 08:00:00 Lisa M. Langrehr "Phillygirl" not sure if it was from this food based on reviews and ingredients ordered this ... 1
1 2011-09-16 08:00:00 Alison Trotta Love these! My cat loves these treats. Whenever she sees t... 5
2 2012-06-30 08:00:00 manaalaq Best dog food we have fund Wellness is by far the best dogfood we have ev... 4
3 2009-11-24 08:00:00 Ronda the Shopper Yum yum lover These noodles are great! The only thing missi... 5
4 2011-06-16 08:00:00 fake name Not original flavor as advertised I recently visited a Cracker Barrel and bought... 2
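`datetime.datetime.fromtimestamp` converts to the local time zone of the machine running the notebook, which is most likely why every row above shows 08:00:00: the raw values fall exactly on midnight UTC. A sketch of a vectorized, time-zone-independent alternative using `pd.to_datetime` with `unit="s"`, applied to two timestamps taken from the `Time` column:

```python
import pandas as pd

# Two Unix timestamps copied from the 'Time' column shown above.
ts = pd.Series([1281052800, 1316131200])

# unit="s" interprets the integers as seconds since 1970-01-01 00:00:00 UTC.
converted = pd.to_datetime(ts, unit="s")

print(converted.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
```

The dates match the ones in the table (2010-08-06 and 2011-09-16), but the time component is 00:00:00 regardless of where the code runs.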
# see the summary of a dataset
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 113691 entries, 0 to 113690
Data columns (total 5 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   Time         113691 non-null  datetime64[ns]
 1   ProfileName  113687 non-null  object        
 2   Summary      113687 non-null  object        
 3   Text         113691 non-null  object        
 4   Score        113691 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(3)
memory usage: 5.2+ MB
# Because a wide range of scores could make prediction too challenging, the scores are categorized into two tiers: satisfied and not satisfied
# Create Sentiment Class
# Score 1-3: not satisfied
# Score 4-5: satisfied
df1['Satisfied'] = pd.cut(df1['Score'], bins =[0,3, float('inf')], labels =['not satisfied', 'satisfied'])
df1.iloc[::1000]
Time ProfileName Summary Text Score Satisfied
id
0 2010-08-06 08:00:00 Lisa M. Langrehr "Phillygirl" not sure if it was from this food based on reviews and ingredients ordered this ... 1 not satisfied
1000 2011-08-25 08:00:00 Gail The real thing! I first had Monin Hazelnut syrup 20 years ago ... 5 satisfied
2000 2010-07-04 08:00:00 HTBK Fantastic Food for Good Cat Health The pet food industry can be one of the most i... 5 satisfied
3000 2011-11-10 08:00:00 J. KIM Great for cold water This product is little more expensive than oth... 5 satisfied
4000 2012-07-09 08:00:00 L. Christie Excellent Product Just started my older dog on this and she LOVE... 5 satisfied
... ... ... ... ... ... ...
109000 2011-08-13 08:00:00 Rhonda Isakson taste good Product arrived quick and taste great. Not su... 5 satisfied
110000 2012-02-02 08:00:00 PJ My hair looks and feels great! I applied this coconut oil to my hair and scal... 4 satisfied
111000 2012-07-31 08:00:00 nekojita they don't last long.... Okay, I'm rating these based on my cats' react... 5 satisfied
112000 2011-08-29 08:00:00 smratguy Scam Marzano tomatoes! If you are just ordering canned tomatoes, then... 1 not satisfied
113000 2008-02-05 08:00:00 G. Little "value seeker" Blah. Tasteless. Very bland. Tastes like raspberry apricot but... 1 not satisfied

114 rows × 6 columns
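To see exactly where the boundary falls: `pd.cut` builds right-closed intervals by default, so the bins `[0, 3, inf]` become (0, 3] and (3, inf]. The sketch below applies the same call to the five possible scores.

```python
import pandas as pd

# Every possible score, one of each.
scores = pd.Series([1, 2, 3, 4, 5])

# Same bins and labels as in the cell above.
labels = pd.cut(
    scores,
    bins=[0, 3, float("inf")],
    labels=["not satisfied", "satisfied"],
)

print(labels.astype(str).tolist())
```

A score of 3 therefore lands in "not satisfied", which matches the rule stated in the comments (Score 1-3: not satisfied).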

#visualize the proportion of sample set with bar chart
ax = df1['Satisfied'].value_counts().plot(kind='bar',
                                    figsize=(8,8),
                                    title="Sentiment of Customers Extracted from Amazon Fine Food Reviews")
ax.set_xlabel("Sentiment of Customer")
ax.set_ylabel("Frequency")
plt.show()
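Since the two classes are far from balanced, it helps to know what accuracy a model would reach by always predicting the larger class. The sketch below computes that majority-class baseline. The helper name `majority_baseline` and the toy label list are assumptions for illustration; the 78/22 split mirrors the rounded score shares shown earlier (14% + 64% for scores 4 and 5).

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most common label, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(labels)

# Hypothetical labels mirroring the rounded shares: about 78% satisfied.
toy_labels = ["satisfied"] * 78 + ["not satisfied"] * 22

print(majority_baseline(toy_labels))
```

Any accuracy reported later is easier to interpret against this figure: a model scoring close to 78% has learned little beyond the class frequencies.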

Data Preprocessing

Now, we will perform some pre-processing on the data before converting it into vectors and passing it to the machine learning model.

Objective: To reduce noise, which lowers the accuracy of the model's predictions, and to make the text simpler for the model to classify.

Method:
1) Use a regular expression to remove any characters that are not letters
2) Convert the string to lowercase
3) Remove stopwords, e.g. 'the', 'an', 'to'; these are considered noise that could make the model less precise
4) Lemmatization: reduce different forms of a word to a base form, e.g. cups -> cup (as called here without a part-of-speech tag, the lemmatizer treats every word as a noun, so verb forms such as 'working' are left unchanged)

# Because this step takes a long time to run, the cleaned text should be saved separately
# processing time: around 40 min
lm = WordNetLemmatizer()  # object of WordNetLemmatizer

def text_transformation(df_col):
    corpus = []
    for item in df_col:
        # match any characters which are not letters and replace them with whitespace
        new_item = re.sub('[^a-zA-Z]', ' ', str(item))
        new_item = new_item.lower()  # convert all to lower case
        new_item = new_item.split()  # split each string by whitespace into a list
        # lemmatize words & keep only words which are not English stopwords
        new_item = [lm.lemmatize(word) for word in new_item if word not in set(stopwords.words('english'))]
        corpus.append(' '.join(str(x) for x in new_item))
    return corpus

corpus = text_transformation(df1['Text'])
df1['text_clean'] = corpus  # store the result; the next step reads df1.text_clean
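One likely contributor to the long runtime is that `set(stopwords.words('english'))` is rebuilt for every single word. The sketch below shows the same cleaning steps with the stop-word set built once and passed in. It is an illustration under assumptions, not the notebook's code: `clean_text`, `TOY_STOP_WORDS`, and the identity default for `lemmatize` are hypothetical stand-ins so that the example runs without NLTK data. In the notebook one would pass `set(stopwords.words('english'))` and `lm.lemmatize` instead.

```python
import re

def clean_text(text, stop_words, lemmatize=lambda word: word):
    """Apply the cleaning steps to one review string."""
    # 1) replace every non-letter character with a space
    letters_only = re.sub(r"[^a-zA-Z]", " ", str(text))
    # 2) lowercase, then split on whitespace
    tokens = letters_only.lower().split()
    # 3) drop stop words, 4) lemmatize what remains
    kept = [lemmatize(tok) for tok in tokens if tok not in stop_words]
    return " ".join(kept)

# Tiny hypothetical stop-word set, built once and reused for every review.
TOY_STOP_WORDS = frozenset({"the", "a", "is", "it", "to", "i", "my"})

print(clean_text("My cat LOVES the treats!!", TOY_STOP_WORDS))
```

Membership tests against a set are fast, so hoisting the set out of the loop removes the repeated list-to-set conversion without changing the output.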

# Note: after cleaning the text, some unwanted elements still remain,
# so regular expressions are needed to get rid of the leftover HTML line-break tags.
# The tag literals in pattern0 and pattern1 were rendered as line breaks on this page;
# '<br />' and '<br>' below are assumptions.
pattern0 = r'<br />'
clean = []
for i in df1.text_clean:
    a = re.sub(pattern0, ' ', i)
    clean.append(a)

pattern1 = r'<br>'
clean1 = []
for i in clean:
    b = re.sub(pattern1, ' ', i)
    clean1.append(b)

pattern2 = r'(br)'
clean2 = []
for i in clean1:
    c = re.sub(pattern2, ' ', i)
    clean2.append(c)

# saving the file
df1.to_pickle("df1_clean.pkl")
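Because the first cleaning step replaces every non-letter with a space, an HTML `<br />` tag ends up as a standalone `br` token in the cleaned text. A pattern such as `(br)` matches those two letters anywhere, including inside ordinary words. The sketch below, on a made-up string, contrasts that with a word-boundary pattern; it is a suggested variant, not what the notebook ran.

```python
import re

# Hypothetical cleaned text containing leftover 'br' tokens.
cleaned = "spicy enough br after my bread order br br great"

# Without word boundaries, 'br' is also cut out of longer words.
naive = re.sub(r"(br)", " ", cleaned)

# \b anchors the match to a whole token, so 'bread' survives.
token_only = re.sub(r"\bbr\b", " ", cleaned)
# Collapse the runs of spaces left behind by the substitution.
token_only = " ".join(token_only.split())

print(naive)
print(token_only)
```

An alternative is to strip the tags from the raw `Text` column before the letters-only step, so that no `br` token is created in the first place.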

# showing some stopwords
print(stopwords.words('english'))
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
df1 = pd.read_pickle("df1_clean.pkl") # reading pkl file
#show before and after cleaning
tmp =df1.iloc[::10000, [3, 6]]
with pd.option_context('display.max_colwidth', None):
  display(tmp)
Text text_clean
id
0 Keeps the cups off the counter!! It is a very well made sturdy product. It is a little stiff to pull down but but I'd rather that than falling down by it's self. We used the included screws to make sure it stays up. We are very pleased. keep cup counter well made sturdy product little stiff pull rather falling self used included screw make sure stay pleased
10000 My son loves this food. He is 16 months now and I still use them. Not all the time, but often. The reason is that his molars are coming in very quickly and he is in a lot of pain. He won't eat much when he's in pain, but these are easy on his gums. They are organic and a quick meal. My son is strong and a very healthy weight. I make sure he gets as much organic, wholesome food as possible. Buying these subscribe and save is a great way to give him good food and still save money. He loves the whole line. son love food month still use time often reason molar coming quickly lot pain eat much pain easy gum organic quick meal son strong healthy weight make sure get much organic wholesome food possible buying subscribe save great way give good food still save money love whole line
20000 Finally!! I love my Keurig, and I don't mind buying the k-cups. What does bother me is the limited availability of flavored decaf coffees. I am a coffee addict and could drink it all day if it didn't keep me up all night. I have tried every other do it yourself k cup product out there and they all stink! This product actually works. It brews a decent cup of coffee, not watered down and not a single ground of coffee in the cup. This product is worth the money! Thinking about buying a second! finally love keurig mind buying k cup bother limited availability flavored decaf coffee coffee addict could drink day keep night tried every k cup product stink product actually work brew decent cup coffee watered single ground coffee cup product worth money thinking buying second
30000 I first discovered these several years ago on a trip to San Francisco, at the Rainbow Market. They are not only vegetarian and gluten-free, they have no preservatives. However, while authentic tasting, they are not quite spicy enough for me.<br />After my young son decided to become a vegetarian, I started ordering this variety pack on a regular basis. Even though the price each is the same as it is in my local store, I can never find all six flavors at the same store at the same time, so it is worth it to order it this way. I serve these over basmati rice, and it's more than enough to feed two people. first discovered several year ago trip san francisco rainbow market vegetarian gluten free preservative however authentic tasting quite spicy enough young son decided become vegetarian started ordering variety pack regular basis even though price local store never find six flavor store time worth order way serve basmati rice enough feed two people
40000 The description on the 16oz Carousel-Sugarfree Gumball Refill Ordered from Candy Crate Inc. reads it contains a qty. of 900 gumballs. This is a lie- when you recieve this product there are only 114 servings in a bag and the serving size is considered 2 gumballs- this is a total of only 228 gumballs. To get the quantity they are claiming you would actually have to purchase about 4 bags. The gumballs themselves are fine, but beware of the fake description. Hard to write a good review when I feel a little ripped off. When purchasing online all we have to go by are the descriptions - if they are not accurate what can we base our purchase on? description oz carousel sugarfree gumball refill ordered candy crate inc read contains qty gumballs lie recieve product serving bag serving size considered gumballs total gumballs get quantity claiming would actually purchase bag gumballs fine beware fake description hard write good review feel little ripped purchasing online go description accurate base purchase
50000 Although I have never gotten it through Amazon.com, baklava made by Shatila Food Products in Michigan (www.shatila.com) is the best there is! Forget the stuff you find at the local bakery or Harry & David--this stuff is in a different league altogether. although never gotten amazon com baklava made shatila food product michigan www shatila com best forget stuff find local bakery harry david stuff different league altogether
60000 I have updated this mixed review due to the currently outrageous high shipping does NOT justify this product's value. Although This is the best gluten free sour dough bread I have sampled thus far it is much too expensive to pay upwards to $30 for 3 loaves. I have shipped baked goods to military family priority mail for a flat rate of under ten bucks in the past.<br />Company states they will reduce shipping charges in the future but that doesn't help me or others currently eager for this product.<br /><br />As for the quality/consistency of this bread, It is fluffy compared to ther bakeries and has not molded after 10 days. It toasts well and easily makes grilled sandwiches delicious. The nutritional content is as follows : 1 slice is 140 calories, no saturated or trans fats, total fat 2 grams. Cholesterol 20 mg, sodium 190 mg, total carb ( for one huge slice ) is 29 grams, fiber 1 gram , sugars 1 gram , protein 2 grams. This has definetly a great sour dough flavor and is not gritty like other gluten free baked goods from other bakeries..I only wish the shipping costs were less with this and all other gluten free baked goods. updated mixed review due currently outrageous high shipping justify product value although best gluten free sour dough bread sampled thus far much expensive pay upwards loaf shipped baked good military family priority mail flat rate ten buck past company state reduce shipping charge future help others currently eager product quality consistency bread fluffy compared ther bakery molded day toast well easily make grilled sandwich delicious nutritional content follows slice calorie saturated trans fat total fat gram cholesterol mg sodium mg total carb one huge slice gram fiber gram sugar gram protein gram definetly great sour dough flavor gritty like gluten free baked good bakery wish shipping cost le gluten free baked good
70000 I don't have a gluten intolerance - just trying to cut back on the intake of wheat/gluten...and sugar, but that's another story;) so my body feels less bloated, unhealthy and lethargic due to wheat. With that said, I do know the differences in taste between wheat pastas, rice, quinoa, corn...etc. Personally, I have come to love the taste of non-wheat pastas over wheat, with the exception of corn flour, which isn't similar enough to wheat to fool my taste buds. Annie's does a fantastic job with this rice flour product, giving it a consistency and flavor akin to Kraft mac and cheese. What makes this better is the cheese, which tastes far yummier than any boxed mac & cheese I have ever had. Also, it's real cheese, with as few bizarre ingredients as possible.<br /><br />For those who like to see ingredients, here is a comparison (and please note I am not a nutritionist - just writing a friendly review:):<br /><br />Kraft Mac & Cheese:<br />Cheese sauce mix ingredients: whey (milk protein), milk protein concentrate, milk, milkfat and cheese culture, salt, sodium tripolyphosphate, sodium phosphate and calcium phosphate, Yellow 5 and Yellow 6, citric acid, lactic acid and enzymes.<br /><br />Annie's Rice Pasta & Cheddar: cheddar cheese (cultured pasturized milk, salt, non-animal enzymes), whey, buttermilk, salt, cream, natural flavor, natural sodium phosphate, annatto extract for natural color.<br /><br />*wiki annatto extract: Annatto coloring is produced from the reddish pericarp or pulp which surrounds the seed of the achiote (Bixa orellana L.). It is used in many natural cheeses (e.g., Cheddar, Red Leicester, Gouda (cheese) and Brie), margarine, butter, rice, smoked fish, and custard powder.<br /><br />Annie's also has less sodium and sugars, which I am grateful for.<br /><br />Also, Annie's does make another rice pasta mac and cheese - it's a deluxe box. This is what I would compare to Velveeta - for you lovers out there. 
It's the ooey gooey cheese that is thicker. Personally, I detest Velveeta, so the deluxe isn't as awesome as the simple Rice Pasta & Cheddar. But the deluxe IS better than macaroni with Velveeta because the consistency of the cheese isn't ridiculous overbearing and throat-clogging as Velveeta. I swear, I always felt like I would suffocate eating that stuff!<br /><br />I definitely recommend this product to those with allergies, and intolerance, or those like myself who are looking for ways to significantly reduce heat intake. My entire family has now switched from Kraft over to this product (and they didn't do it for health reasons - they simply prefer the taste!)<br /><br />It's a bit more pricey, I'll give you that. But for a hint - do check Target occasionally. They sell Annie's pastas and some amazing organic bunny fruit snacks - all of which go on sale quite often (I just purchased Rice Pasta & Cheddar for $1 a box!) If only the prices were always so kind;) gluten intolerance trying cut back intake wheat gluten sugar another story body feel le bloated unhealthy lethargic due wheat said know difference taste wheat pasta rice quinoa corn etc personally come love taste non wheat pasta wheat exception corn flour similar enough wheat fool taste bud annie fantastic job rice flour product giving consistency flavor akin kraft mac cheese make better cheese taste far yummier boxed mac cheese ever also real cheese bizarre ingredient possible like see ingredient comparison please note nutritionist writing friendly review kraft mac cheese cheese sauce mix ingredient whey milk protein milk protein concentrate milk milkfat cheese culture salt sodium tripolyphosphate sodium phosphate calcium phosphate yellow yellow citric acid lactic acid enzyme annie rice pasta cheddar cheddar cheese cultured pasturized milk salt non animal enzyme whey buttermilk salt cream natural flavor natural sodium phosphate annatto extract natural color wiki annatto extract annatto coloring produced 
reddish pericarp pulp surround seed achiote bixa orellana l used many natural cheese e g cheddar red leicester gouda cheese brie margarine butter rice smoked fish custard powder annie also le sodium sugar grateful also annie make another rice pasta mac cheese deluxe box would compare velveeta lover ooey gooey cheese thicker personally detest velveeta deluxe awesome simple rice pasta cheddar deluxe better macaroni velveeta consistency cheese ridiculous overbearing throat clogging velveeta swear always felt like would suffocate eating stuff definitely recommend product allergy intolerance like looking way significantly reduce heat intake entire family switched kraft product health reason simply prefer taste bit pricey give hint check target occasionally sell annie pasta amazing organic bunny fruit snack go sale quite often purchased rice pasta cheddar box price always kind
80000 This drink tastes good. I enjoyed it. I also had my daughter and my grandchildren try it--they drank even more of it and found it pleasant to the taste. Once mixed and cold in the fridge, it didn't last long. A good alternative to pop I think.<br /><br />Recommended. drink taste good enjoyed also daughter grandchild try drank even found pleasant taste mixed cold fridge last long good alternative pop think recommended
90000 I quite enjoyed these cookies. They are reminicent of a shortbread cookie with a hint of orange and some chewy Crasins thrown in for good measure. Fairly reasonable stats for a cookie (140 calories, 5 grams of total fat and 7 sugars)---until you see that is only for THREE cookies. No way you're going to hold yourself to 3 lousy cookies in one sitting so you'd better plan on doubling that. But it is still a better choice that a lot of offerings in the cookie isle. And that is just where I'd go to purchase these. They didn't hold up well in shipping and I wound up with a lot of crumbs. Which I ate anyway. Because they were too yummy to let that stop me. Enjoy! quite enjoyed cooky reminicent shortbread cookie hint orange chewy crasins thrown good measure fairly reasonable stats cookie calorie gram total fat sugar see three cooky way going hold lousy cooky one sitting better plan doubling still better choice lot offering cookie isle go purchase hold well shipping wound lot crumb ate anyway yummy let stop enjoy
100000 A very nutritrious and delicious soup from Amy's not offered in the organic sections of grocery stores in my area of the U.S. But about a quarter of the cans in the case were dented, and therefore, not acceptable for long term storage.<br /> I wouldn't buy a dented can from a store, and therefore am dismayed that Amazon would ship damaged goods.<br /> If Amazon is getting a good price on this product because the cans are dented already, the product should be advertized as such. I don't like being sent a case of canned goods with the cans in the middle of the case crushed. What's up with that?<br /> I have been satisfied with the condition of other canned goods bought vie Amazon - but buyer beware.<br /> It is not worth my time to complain and return.<br /> But I won't buy Amy's soups through Amazon again, and in the future, will REALLY question whether buying ANY canned goods through Amazon is worth it - even if the price is right - considering that the goods may or may not arrive damaged.<br /> Hey Amazon, Honesty is the best policy. It's not a "deal" if you send me damaged goods. nutritrious delicious soup amy offered organic section grocery store area u quarter can case dented therefore acceptable long term storage buy dented store therefore dismayed amazon would ship damaged good amazon getting good price product can dented already product advertized like sent case canned good can middle case crushed satisfied condition canned good bought vie amazon buyer beware worth time complain return buy amy soup amazon future really question whether buying canned good amazon worth even price right considering good may may arrive damaged hey amazon honesty best policy deal send damaged good
110000 There are several things a coffee lover looks for in their brew.. the aroma, the color and the taste are what I look for. When I opened the individual pack, I was hit with a wonderful coffee scent. The pod looks typical, and are made for a pod machine. When I made my first cup, since the aroma was strong, I filled the machine with a good sized mug's worth of water, and made a cup. When it was done, the color was fairly light, so I only added a small amount of milk. Still, the flavor was a bit too bland for me, and I like a mild coffee. For the second cup, I used a smaller mug, and in return got a darker, more flavorful cup, so that is my recommendation with this brand. Some other things I love about this brand: it's organic, sustainably grown and Fair Trade certified. That's a lot of benefits for only about 75 cents a cup. The even have their own foundation to support youth soccer programs. This is a good deal. The only thing that I would change is the individual wrap, which seems like an excess of packaging for a company dedicated to the environment. several thing coffee lover look brew aroma color taste look opened individual pack hit wonderful coffee scent pod look typical made pod machine made first cup since aroma strong filled machine good sized mug worth water made cup done color fairly light added small amount milk still flavor bit bland like mild coffee second cup used smaller mug return got darker flavorful cup recommendation brand thing love brand organic sustainably grown fair trade certified lot benefit cent cup even foundation support youth soccer program good deal thing would change individual wrap seems like excess packaging company dedicated environment

WordCloud

  • use a word cloud to find the words that appear most frequently in the reviews
  • this requires converting the pandas Series (the text_clean column) into one long string stored in a variable

Note that the result is just one long string in a variable, which we then pass to a WordCloud object.

#Processing Time: 10 min
#prepare data for the word cloud visualization
word = df1['text_clean']
comment_words = ""  # start with an empty string

# loop over each review in the corpus and append it to comment_words
# note: no separator is added, so the last word of one review is fused with
# the first word of the next (e.g. 'pleasedbar' in the output below)
for review in word:
    comment_words += review
len(comment_words)
29091504
type(comment_words)
str
comment_words[0:1000]
'keep cup counter well made sturdy product little stiff pull rather falling self used included screw make sure stay pleasedbar pretty good taste like cinnamon apple pie imho texture surprise pretty mushy thick bit almond throughout chewy taste real fruit inside think different dried fruit pressed together first time trying bar like part new diet like minimal ingredient used chems ate trio brand bar along side one comparison reason star personal preference like nutty crunchy texture trio brand overall think good bar high cals pretty good nutrient looking good tasting vegan bar good one trylove pantry cook batch rice add sauce dinner served year old son love asks several time weekused another brand tonkotsu flavor noodle imported japan year ago u stopped importing japan reason happy finally found product made hong kong family love flavor taste good japanese tonkotsu noodle strongly recommended enjoyherr favorite chip brand fan salsa love chipabsolutely good french vanilla cappuccino bough'
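The sample output above contains fused tokens such as 'pleasedbar' and 'trylove', because consecutive reviews were concatenated without a separator. A minimal sketch of the difference, assuming a small made-up list of strings in place of df1['text_clean'] (the `reviews` list and its contents are hypothetical):

```python
from collections import Counter

# Toy stand-in for df1['text_clean']; these strings are made up for illustration.
reviews = [
    "keep cup counter pleased",
    "bar pretty good taste",
    "love pantry cook rice",
]

fused = "".join(reviews)    # no separator: yields 'pleasedbar' and 'tastelove'
spaced = " ".join(reviews)  # one space between reviews keeps the words apart

print("pleasedbar" in fused.split())   # True
print("pleasedbar" in spaced.split())  # False

# Counting words gives a quick preview of what a word cloud would emphasize.
top_words = Counter(spaced.split()).most_common(3)
print(top_words)
```

Joining with a single space keeps the word boundaries intact, so the word cloud counts real words rather than accidental compounds.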
# pass the parameters and the 'comment_words' string generated in the previous step
wordcloud = WordCloud(width = 1500, height = 1500,background_color ='white',min_font_size = 10).generate(comment_words)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud)
plt.title('High Frequency of Words Found in Customer Reviews')
#plt.savefig('wordclound.png') # set the file to png.
plt.show()

Visualization with a heat map. Assumption: reviews with different scores should occupy different positions in vector space, so we use a heat map to check whether reviews from different score ranges really differ in vector space.

A short note on word embeddings

Word embeddings are texts converted into numbers, and the same text may have several different numerical representations. To build any machine learning or deep learning model, the final data has to be in numerical form, because models do not understand text or image data directly as humans do. Vectorization, or word embedding, is therefore the process of converting text data into numerical vectors, which are later used to build machine learning models. In this sense, we are extracting features from text in order to build natural language processing models. There are different ways to convert text data into numerical vectors. Broadly, word embeddings can be classified into the following two categories:

  • Frequency-based (statistical) word embedding
  • Prediction-based word embedding
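As a rough illustration of the frequency-based category, the sketch below builds count vectors by hand. The helper `count_vector` and the two toy documents are hypothetical and only meant to show the idea of one column per vocabulary word:

```python
from collections import Counter

def count_vector(text, vocabulary):
    """Return the count of each vocabulary word in text, in vocabulary order."""
    counts = Counter(text.split())
    return [counts[w] for w in vocabulary]

# Toy documents, made up for illustration.
docs = ["love chip love salsa", "green tea love"]

# The vocabulary is the sorted set of all words seen in the documents.
vocabulary = sorted({w for d in docs for w in d.split()})
vectors = [count_vector(d, vocabulary) for d in docs]

print(vocabulary)  # ['chip', 'green', 'love', 'salsa', 'tea']
print(vectors)     # [[1, 0, 2, 1, 0], [0, 1, 1, 0, 1]]
```

Each document becomes a row of word counts, which is the same kind of representation a count-based vectorizer produces, only stored densely here.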

Categorize reviews into two groups: score 5, and score 3 or lower

# keep only the reviews with Score == 5
filter0 = df1['Score'] == 5
score_5 = df1[filter0]

# keep only the reviews with Score < 4
filter1 = df1['Score'] < 4
score_1to3 = df1[filter1]
# take the first 500 rows of each group so both have the same shape
score_5 = score_5[['Time','ProfileName','text_clean','Score']].iloc[0:500]
score_1to3 = score_1to3[['Time','ProfileName','text_clean','Score']].iloc[0:500]
tf_score5 = score_5
score_5
Time ProfileName text_clean Score
id
0 2012-02-11 08:00:00 cac Idaho keep cup counter well made sturdy product litt... 5
2 2008-10-16 08:00:00 Auskan "Auskan" love pantry cook batch rice add sauce dinner s... 5
3 2012-08-24 08:00:00 chicago used another brand tonkotsu flavor noodle impo... 5
4 2010-07-13 08:00:00 you suckkk herr favorite chip brand fan salsa love chip 5
5 2012-03-15 08:00:00 Donna absolutely good french vanilla cappuccino boug... 5
... ... ... ... ...
787 2010-12-09 08:00:00 Erika new favorite snack food whenever craving sweet... 5
789 2007-03-09 08:00:00 J. Lamar prepared kit basic add shrimp anything red pep... 5
792 2011-11-21 08:00:00 JVR Mom month old daughter love formula mixing issue t... 5
793 2011-01-25 08:00:00 Stacy "sllemke" best ever candy person sweet general however f... 5
795 2009-03-03 08:00:00 O. Vinogradova "jaded mouse" love treat training small tasty least dog seem... 5

500 rows × 4 columns

score_1to3
Time ProfileName text_clean Score
id
7 2011-03-12 08:00:00 CANDICE pop nice never get taste like movie theater po... 2
10 2011-08-25 08:00:00 Light by the Moon ordered birthday got birthday money family ord... 2
15 2009-05-25 08:00:00 MamavanMNE candy good taste seem made natural ingredient ... 2
16 2011-05-11 08:00:00 Dr. M. A. Dixon "hyper-observant" tea taste like blend ingredient listed taste l... 3
25 2012-06-15 08:00:00 annie "grannieannie" put enough creamer coffee tolerable good coffe... 2
... ... ... ... ...
2221 2009-09-29 08:00:00 Robert Y. Lamaute "blamaute" light bright florescent bulb wattage look nice... 3
2246 2012-04-03 08:00:00 Lindsay Pasch "VaBookworm87" come conclusion big fan thing definitely say m... 3
2250 2011-02-08 08:00:00 Robert C. Reade "Random buyer" ordered coffee another brand seems three week ... 1
2259 2006-11-10 08:00:00 Kate bought amazon becuase disappeared real store c... 3
2260 2010-07-18 08:00:00 Steven Meuse lucky stock canister last summer three left wr... 1

500 rows × 4 columns

Transform Text to Vector

# transform the text into count vectors, which are stored as a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
tf_score5  = count_vect.fit_transform(score_5['text_clean'])
tf_score5
# note: this second fit_transform refits the vectorizer, so it learns a new
# vocabulary from score_1to3 only
tf_score1to3  = count_vect.fit_transform(score_1to3['text_clean'])
tf_score1to3
<500x4287 sparse matrix of type '<class 'numpy.int64'>'
    with 18782 stored elements in Compressed Sparse Row format>
# check its shape
tf_score1to3.shape
(500, 4287)

Cosine Similarity

After transforming the text into vectors, we need to give the two sparse matrices the same number of columns so that we can pass them to the cosine_similarity function. Cosine similarity is one way to measure how close two data points are in vector space: it is the cosine of the angle between their vectors, so for count vectors 1 means the same direction and 0 means no shared terms. In our case, we compute it for the reviews and visualize the result with a heat map.
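To make the measure concrete, here is a minimal pure-Python sketch of cosine similarity between two count vectors. The function name `cosine` and the example vectors are illustrative; the notebook itself relies on scikit-learn's cosine_similarity:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)

# Toy count vectors over a five-word vocabulary (made up for illustration).
a = [1, 0, 2, 1, 0]
b = [0, 1, 1, 0, 1]

print(round(cosine(a, b), 4))  # 0.4714
print(cosine([1, 0], [0, 1]))  # 0.0, the vectors share no terms
```

Because the measure depends only on direction, a long review and a short review that use the same words in similar proportions still score close to 1.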

Reshape Sparse matrix

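# keep the first 3511 columns of each matrix and convert to dense arrays
# note: the two matrices were built from different vocabularies, so the same
# column index may refer to different words in each of them
tf_score5=tf_score5[0:500, 0:3511].toarray()
tf_score1to3=tf_score1to3[0:500, 0:3511].toarray()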
tf_score1to3
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)
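One caveat with truncating columns: when each group is vectorized on its own, column j can stand for a different word in each matrix, so cross-group similarities compare unrelated words. A possible alternative is to build one shared vocabulary and vectorize both groups against it. The sketch below shows the idea in plain Python; `build_vocab`, `vectorize`, and the toy reviews are hypothetical. With scikit-learn, the analogous pattern would be to fit a single CountVectorizer on the text of both groups and then call transform on each group.

```python
def build_vocab(docs):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(words)}

def vectorize(doc, vocab):
    """Count vector for doc, with one column per vocabulary word."""
    row = [0] * len(vocab)
    for w in doc.split():
        if w in vocab:
            row[vocab[w]] += 1
    return row

# Toy groups, made up for illustration.
group_a = ["love chip", "tasty chip"]
group_b = ["bland coffee", "love coffee"]

# Separate vocabularies: column 0 means 'chip' in one and 'bland' in the other.
vocab_a = build_vocab(group_a)
vocab_b = build_vocab(group_b)
print(vocab_a)  # {'chip': 0, 'love': 1, 'tasty': 2}
print(vocab_b)  # {'bland': 0, 'coffee': 1, 'love': 2}

# Shared vocabulary: every column means the same word in both groups.
shared = build_vocab(group_a + group_b)
rows_a = [vectorize(d, shared) for d in group_a]
rows_b = [vectorize(d, shared) for d in group_b]
print(rows_a)  # [[0, 1, 0, 1, 0], [0, 1, 0, 0, 1]]
print(rows_b)  # [[1, 0, 1, 0, 0], [0, 0, 1, 1, 0]]
```

With a shared vocabulary both matrices have the same width by construction, so no column slicing is needed before computing similarities.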

Get the Cosine Score

#get the cosine score
from sklearn.metrics.pairwise import cosine_similarity 
cosinescore = cosine_similarity(tf_score5 ,tf_score1to3)
cosinescore
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.04783649, 0.        ,
        0.03181424],
       [0.        , 0.01756821, 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.02179068, 0.03846154, ..., 0.        , 0.        ,
        0.        ]])

Heatmap

# the portion of the data that will be displayed in the heat map
plot_z = cosinescore[0:40, 0:40]
# generate heat map of score-5 reviews (rows) vs. score 1-3 reviews (columns)
import seaborn as sns

df_todraw = pd.DataFrame(plot_z)
plt.subplots(figsize=(20, 15))
ax = sns.heatmap(df_todraw,
                 cmap="YlGnBu",
                 vmin=0, vmax=1, annot=True, fmt='.1f')
plt.show()

#Note: the heat map shows little similarity between these two groups of reviews, which is expected
#because their score ranges differ significantly.

Compare score-5 reviews with other score-5 reviews

cosinescore5 = cosine_similarity(tf_score5 ,tf_score5)
cosinescore5
array([[1.        , 0.        , 0.10606602, ..., 0.05      , 0.        ,
        0.0438529 ],
       [0.        , 1.        , 0.10882144, ..., 0.15389675, 0.086711  ,
        0.08998425],
       [0.10606602, 0.10882144, 1.        , ..., 0.10606602, 0.02988072,
        0.09302605],
       ...,
       [0.05      , 0.15389675, 0.10606602, ..., 1.        , 0.08451543,
        0.0438529 ],
       [0.        , 0.086711  , 0.02988072, ..., 0.08451543, 1.        ,
        0.03706247],
       [0.0438529 , 0.08998425, 0.09302605, ..., 0.0438529 , 0.03706247,
        1.        ]])
plot_zz = cosinescore5[0:40, 41:81]
plot_x = list(range(41,81))
import seaborn as sns

df_todraw2 = pd.DataFrame(plot_zz, columns = plot_x)
plt.subplots(figsize=(20, 15))
ax = sns.heatmap(df_todraw2,
                 cmap="YlGnBu",
                 vmin=0, vmax=1, annot=True, fmt='.1f')
plt.show()

#Note: When comparing reviews that all have a score of 5, they tend to be more similar to one another.

Pulling out some reviews with high cosine similarity to see how they are similar

score_5.iloc[3,2]
'herr favorite chip brand fan salsa love chip'
score_5.iloc[60,2]
'cannot tolerate extremely hot spicy chip like zing crunch chip made enjoyment purchased around holiday truly enjoyed many guest yes purchasing chip really good organic affordable'
score_5.iloc[33,2]
'love arizona green tea ginseng drink time simple easy carry packet purse'
score_5.iloc[78,2]
'love love green tea hard find area place internet charge big price usually get many box merchant definitely order seller thanks depend green tea fix everyday'
score_5.iloc[36,2]
'cereal tasty healthy spice bit good add banana walnut coconut shaving mmm good'
score_5.iloc[37,2] # score 0.0, this review is about cereal
'love chip longer crave regular potato chip tasty crunchy alot salt tho'

Result: the pairs with cosine similarity of 0.4 and 0.3 are positive reviews of the same kind of product, such as chips or green tea
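Instead of reading candidate pairs off the heat map, the highest-scoring pairs can also be listed programmatically. This is a minimal sketch under the assumption of a symmetric similarity matrix; the helper `top_pairs` and the 3x3 matrix are made up for illustration and are not values from the notebook:

```python
def top_pairs(sim, k=3):
    """Return the k most similar (row, column, score) pairs of a symmetric matrix.

    Only entries above the diagonal are considered, so self-similarity
    (always 1.0) is skipped and each pair is reported once.
    """
    pairs = []
    for i, row in enumerate(sim):
        for j in range(i + 1, len(row)):
            pairs.append((i, j, row[j]))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]

# Toy symmetric similarity matrix (hypothetical values).
sim = [
    [1.0, 0.1, 0.4],
    [0.1, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]

print(top_pairs(sim, k=2))  # [(0, 2, 0.4), (1, 2, 0.3)]
```

The returned row and column positions could then be used to look up the corresponding review texts, as done manually with iloc above.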

Showing the proportion of reviews categorized by the ‘satisfied’ and ‘not satisfied’ labels

ax = df1['Satisfied'].value_counts().plot(kind='bar',
                                    figsize=(8,8),
                                    title="Sentiment of Customers Extracted from Amazon Fine Food Reviews")
ax.set_xlabel("Sentiment of Customer")
ax.set_ylabel("Frequency")
plt.show()

Because the customer reviews contain significantly more ‘satisfied’ reviews than ‘not satisfied’ reviews, the sentiment classes are imbalanced, which might bias the model. Handling class imbalance properly is out of scope for this report, so we will simply take an equal number of samples from each group.

Get a sample set

Function to get a sample set

# function to get the same number of reviews from each sentiment class
# the goal is to remove the class imbalance in the data set
def get_top_data(top_n = 20000):
    # take the first top_n rows of each class
    top_data_df_positive = df1[df1['Satisfied'] == 'satisfied'].head(top_n)
    top_data_df_negative = df1[df1['Satisfied'] == 'not satisfied'].head(top_n)
    top_data_df_small = pd.concat([top_data_df_positive, top_data_df_negative])
    return top_data_df_small
# extract 20,000 reviews from each class
df2 = get_top_data(top_n=20000)
ax = df2['Satisfied'].value_counts().plot(kind='bar',
                                    figsize=(8,8),
                                    title="Sentiment of Customers Extracted from Amazon Fine Food Reviews")
ax.set_xlabel("Sentiment of Customer")
ax.set_ylabel("Frequency")
plt.show()

#the data set is now balanced
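
The function above takes the first top_n rows of each class. If a random draw is preferred, one possible variation (not what this notebook runs) is to sample within each class. The helper name sample_balanced and the toy frame below are hypothetical and only stand in for df1.

```python
import pandas as pd

def sample_balanced(frame, label_col, n_per_class, seed=15):
    """Draw n_per_class random rows from every class found in label_col."""
    return (
        frame.groupby(label_col, group_keys=False)
             .sample(n=n_per_class, random_state=seed)
    )

# Hypothetical toy data: 8 'satisfied' rows against 3 'not satisfied' rows
toy = pd.DataFrame({
    "text_clean": [f"review {i}" for i in range(11)],
    "Satisfied": ["satisfied"] * 8 + ["not satisfied"] * 3,
})

balanced = sample_balanced(toy, "Satisfied", n_per_class=3)
print(balanced["Satisfied"].value_counts())
```

Fixing random_state keeps the draw reproducible, which matters when comparing later model runs.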

#Tokenization
#separate the text into single words; this helps when transforming text to numeric values

from gensim.utils import simple_preprocess
# Tokenize the text column to get the new column 'tokenized_text'
df2['tokenized_text'] = [simple_preprocess(line, deacc=True) for line in df2['text_clean']] 
print(df2['tokenized_text'].head(10))
id
0     [keep, cup, counter, well, made, sturdy, produ...
1     [bar, pretty, good, taste, like, cinnamon, app...
2     [love, pantry, cook, batch, rice, add, sauce, ...
3     [used, another, brand, tonkotsu, flavor, noodl...
4     [herr, favorite, chip, brand, fan, salsa, love...
5     [absolutely, good, french, vanilla, cappuccino...
6     [cheaper, chain, cup, make, home, stuff, aweso...
8     [bought, coffee, amazon, special, promotion, g...
9     [dog, love, zuke, treat, one, acceptation, lik...
11    [cereal, like, chex, healthier, outstanding, f...
Name: tokenized_text, dtype: object
[col for col in df2]
['Time',
 'ProfileName',
 'Summary',
 'Text',
 'Score',
 'Satisfied',
 'text_clean',
 'tokenized_text']
from gensim.parsing.porter import PorterStemmer
porter_stemmer = PorterStemmer()
# Get the stemmed_tokens
df2['stemmed_tokens'] = [[porter_stemmer.stem(word) for word in tokens] for tokens in df2['tokenized_text'] ]
df2['stemmed_tokens'].head(10)
id
0     [keep, cup, counter, well, made, sturdi, produ...
1     [bar, pretti, good, tast, like, cinnamon, appl...
2     [love, pantri, cook, batch, rice, add, sauc, d...
3     [us, anoth, brand, tonkotsu, flavor, noodl, im...
4     [herr, favorit, chip, brand, fan, salsa, love,...
5     [absolut, good, french, vanilla, cappuccino, b...
6     [cheaper, chain, cup, make, home, stuff, aweso...
8     [bought, coffe, amazon, special, promot, go, e...
9     [dog, love, zuke, treat, on, accept, like, muc...
11    [cereal, like, chex, healthier, outstand, flav...
Name: stemmed_tokens, dtype: object
tmp =df2.iloc[::2000, [6, 7]]
with pd.option_context('display.max_colwidth', None):
  display(tmp)
text_clean tokenized_text
id
0 keep cup counter well made sturdy product little stiff pull rather falling self used included screw make sure stay pleased [keep, cup, counter, well, made, sturdy, product, little, stiff, pull, rather, falling, self, used, included, screw, make, sure, stay, pleased]
2572 dog easy finding treat one fit bill two aussie shepherd get full piece min pin chihuahua mix get cut half love always look forward special treat [dog, easy, finding, treat, one, fit, bill, two, aussie, shepherd, get, full, piece, min, pin, chihuahua, mix, get, cut, half, love, always, look, forward, special, treat]
5116 reading review confused think anyone talking tea sounded like comment pertained prince peace green tea anything instant dong quai red date tea clicked said remarkable tea delicious instant wait steep really good love taste enjoy uniquely bitter flavor dong quai bitter yummy know dong quai considered ginseng woman beceause high vit b help keep woman becoming anemic due monthly cycle also said help regulate irregular period used daily basis said tea special help draw one energy downwards red date add effect red color st chakra word tea aphrodisiac quality create pleasnt feeling taken bed good taste good snap make quibbling [reading, review, confused, think, anyone, talking, tea, sounded, like, comment, pertained, prince, peace, green, tea, anything, instant, dong, quai, red, date, tea, clicked, said, remarkable, tea, delicious, instant, wait, steep, really, good, love, taste, enjoy, uniquely, bitter, flavor, dong, quai, bitter, yummy, know, dong, quai, considered, ginseng, woman, beceause, high, vit, help, keep, woman, becoming, anemic, due, monthly, cycle, also, said, help, regulate, irregular, period, used, daily, basis, said, tea, special, help, draw, one, energy, downwards, red, date, add, effect, red, color, st, chakra, word, tea, aphrodisiac, quality, create, pleasnt, feeling, taken, bed, good, taste, good, snap, make, quibbling]
7676 wow find avid latte drinker refuse pay outlandish price local coffee shop purchased machine couple year ago find supplier using flavor add coffee going business thankfully amazon com came rescue get convienence flavor delivered home paying le per bottle delivered thank amazon com loyal customer illinois [wow, find, avid, latte, drinker, refuse, pay, outlandish, price, local, coffee, shop, purchased, machine, couple, year, ago, find, supplier, using, flavor, add, coffee, going, business, thankfully, amazon, com, came, rescue, get, convienence, flavor, delivered, home, paying, le, per, bottle, delivered, thank, amazon, com, loyal, customer, illinois]
10212 mo old simply love stuff st official finger food think combo plus taste good dissolve easy mouth kind important teeth mom comment wish little green even close green color like waved green bowl making someone ought sell stuff mixed case go thru one container every day [mo, old, simply, love, stuff, st, official, finger, food, think, combo, plus, taste, good, dissolve, easy, mouth, kind, important, teeth, mom, comment, wish, little, green, even, close, green, color, like, waved, green, bowl, making, someone, ought, sell, stuff, mixed, case, go, thru, one, container, every, day]
12784 extremely better wilton buy never use nasty stuff actually edible unlike product easy work [extremely, better, wilton, buy, never, use, nasty, stuff, actually, edible, unlike, product, easy, work]
15326 using french market coffee many year moving guam store brought sent went mainland last year thrilled find could order amazon reasonable price automatic shipment coffee never bitter due chicory robust tasty use anything else [using, french, market, coffee, many, year, moving, guam, store, brought, sent, went, mainland, last, year, thrilled, find, could, order, amazon, reasonable, price, automatic, shipment, coffee, never, bitter, due, chicory, robust, tasty, use, anything, else]
17900 dog href http www amazon com gp product b j jkgo canidae dry dog food lamb meal brown rice formula pound bag year recently added diet stool firm seems like crazy taste texture little thick even mixed water [dog, href, http, www, amazon, com, gp, product, jkgo, canidae, dry, dog, food, lamb, meal, brown, rice, formula, pound, bag, year, recently, added, diet, stool, firm, seems, like, crazy, taste, texture, little, thick, even, mixed, water]
20427 santa cruz soft baked oatmeal raisin cookie one best ever flavor wonderful spice make think eating holiday pastry put plate cooky always first go call adult cookie child love price good delivered door could ask whole line cooky wonderful try find youself happy eating [santa, cruz, soft, baked, oatmeal, raisin, cookie, one, best, ever, flavor, wonderful, spice, make, think, eating, holiday, pastry, put, plate, cooky, always, first, go, call, adult, cookie, child, love, price, good, delivered, door, could, ask, whole, line, cooky, wonderful, try, find, youself, happy, eating]
23008 husband love tea drink antioxidant content difficulty finding favorite grocery store simply order amazon [husband, love, tea, drink, antioxidant, content, difficulty, finding, favorite, grocery, store, simply, order, amazon]
7 pop nice never get taste like movie theater popcorn even come close gave star popping taste [pop, nice, never, get, taste, like, movie, theater, popcorn, even, come, close, gave, star, popping, taste]
9236 href http www amazon com gp product b la vegetable base first bought product heb randalls sauce bought product get sauce like well used anyway good taste change buy would [href, http, www, amazon, com, gp, product, la, vegetable, base, first, bought, product, heb, randalls, sauce, bought, product, get, sauce, like, well, used, anyway, good, taste, change, buy, would]
18415 liquid v fish v flavor opened expecting large amount liquid ended spilling part table price expecting fish put another way fish swimming liquid flavor good give star could much better buy find appel brunswick [liquid, fish, flavor, opened, expecting, large, amount, liquid, ended, spilling, part, table, price, expecting, fish, put, another, way, fish, swimming, liquid, flavor, good, give, star, could, much, better, buy, find, appel, brunswick]
27414 tried one big bowl hot spicy almost identical taste mainly use main base soup okay first two time eat however burning taste get old quick known base used one would bought big bowl soup instead smaller bowl sized noodle [tried, one, big, bowl, hot, spicy, almost, identical, taste, mainly, use, main, base, soup, okay, first, two, time, eat, however, burning, taste, get, old, quick, known, base, used, one, would, bought, big, bowl, soup, instead, smaller, bowl, sized, noodle]
36538 love mcdougall food product one quite measure aftertaste enjoy find single container try first recommend [love, mcdougall, food, product, one, quite, measure, aftertaste, enjoy, find, single, container, try, first, recommend]
45800 get tea asian grocery around dollar tea good rip [get, tea, asian, grocery, around, dollar, tea, good, rip]
55169 one ate product one clean vomit product eaten offered new chew dog first cared le record love chew drug choice thought maybe reluctant try something new left chew little later ate seemed enjoy afterward threw twice sure problem chew buying dog certainly recommending anyone else thought giving remaining chew spca dog decided make poochies sick sadly throw rest away [one, ate, product, one, clean, vomit, product, eaten, offered, new, chew, dog, first, cared, le, record, love, chew, drug, choice, thought, maybe, reluctant, try, something, new, left, chew, little, later, ate, seemed, enjoy, afterward, threw, twice, sure, problem, chew, buying, dog, certainly, recommending, anyone, else, thought, giving, remaining, chew, spca, dog, decided, make, poochies, sick, sadly, throw, rest, away]
64486 incredibly embarrassed basket thought sending something substance based seller description picture cost sister law suffered incredibly life threatening illness received basket town family thinking something use offer guest local relative snack tea visited cheese basket nothing expensive cracker school lunch size packet chocolate chip cooky embarrassed many company offer le expensive basket greater good order basket barb dv [incredibly, embarrassed, basket, thought, sending, something, substance, based, seller, description, picture, cost, sister, law, suffered, incredibly, life, threatening, illness, received, basket, town, family, thinking, something, use, offer, guest, local, relative, snack, tea, visited, cheese, basket, nothing, expensive, cracker, school, lunch, size, packet, chocolate, chip, cooky, embarrassed, many, company, offer, le, expensive, basket, greater, good, order, basket, barb, dv]
73778 accurate description product ordered received one box ten bar bar great since one box worth buying [accurate, description, product, ordered, received, one, box, ten, bar, bar, great, since, one, box, worth, buying]
82541 initial review one star pro amazon delivery ordered yesterday prime membership ontrac delivery placed doorstep today saturday thank amazon con either coffee bad make work tried approach senseo machine almost weight two senseo pod unfortunately longer sold amazon used two pod holder fit one pod holder coffee appeared brew e water ran machine fine result even close passable way tried result senseo user understand term right brew button two bar level far weak right brew button one bar level weak left brew button one bar level strong quite bitter left brew button two bar level weak bitter moral story order pod unless specifically designed machine since amazon accept return food product least let loss stand lesson others avoid senseo machine fit suspect coffee good message amazon sold coffee machine sell senseo brand pod anymore something specifically fit g pod update revision one day later okay since stuck thing figured give easily went back senseo pod managed make reasonably good star coffee lesson learned important note senseo user vi vi assume g pod make sure pod oriented correct side use right brew button one bar level yield tasty cup coffee certainly two cup yield know cost benefit v using gram pod reality bottom line revised star moral story give [initial, review, one, star, pro, amazon, delivery, ordered, yesterday, prime, membership, ontrac, delivery, placed, doorstep, today, saturday, thank, amazon, con, either, coffee, bad, make, work, tried, approach, senseo, machine, almost, weight, two, senseo, pod, unfortunately, longer, sold, amazon, used, two, pod, holder, fit, one, pod, holder, coffee, appeared, brew, water, ran, machine, fine, result, even, close, passable, way, tried, result, senseo, user, understand, term, right, brew, button, two, bar, level, far, weak, right, brew, button, one, bar, level, weak, left, brew, button, one, bar, level, strong, quite, bitter, left, brew, button, two, bar, level, weak, bitter, moral, story, order, pod, ...]

#Splitting into Train and Test Sets: The training data is used to train the model. The model then predicts the classes of the test data, and those predictions are compared with the original labels to check accuracy and other test metrics.

NOTE: In this case I will split the data 70:30 (train:test)

df2
Time ProfileName Summary Text Score Satisfied text_clean tokenized_text stemmed_tokens
id
0 2012-02-11 08:00:00 cac Idaho Great for small kitchens. Keeps the cups off the counter!! It is a very ... 5 satisfied keep cup counter well made sturdy product litt... [keep, cup, counter, well, made, sturdy, produ... [keep, cup, counter, well, made, sturdi, produ...
1 2010-07-24 08:00:00 Fielden A. Coleman "Coleblooded1" Good Taste!! Good Price too!! The bar is pretty good. Taste more like cinnam... 4 satisfied bar pretty good taste like cinnamon apple pie ... [bar, pretty, good, taste, like, cinnamon, app... [bar, pretti, good, tast, like, cinnamon, appl...
2 2008-10-16 08:00:00 Auskan "Auskan" Easy & delicious I love having this in my pantry. I cook a bat... 5 satisfied love pantry cook batch rice add sauce dinner s... [love, pantry, cook, batch, rice, add, sauce, ... [love, pantri, cook, batch, rice, add, sauc, d...
3 2012-08-24 08:00:00 chicago This is the best! We used to have another brand tonkotsu flavor ... 5 satisfied used another brand tonkotsu flavor noodle impo... [used, another, brand, tonkotsu, flavor, noodl... [us, anoth, brand, tonkotsu, flavor, noodl, im...
4 2010-07-13 08:00:00 you suckkk Yum Herr's are my favorite chip brand. I am not su... 5 satisfied herr favorite chip brand fan salsa love chip [herr, favorite, chip, brand, fan, salsa, love... [herr, favorit, chip, brand, fan, salsa, love,...
... ... ... ... ... ... ... ... ... ...
91440 2012-09-05 08:00:00 JFMile Son wasn't a fan My son simply did not like this flavor jar. H... 3 not satisfied son simply like flavor jar like pea spinach ev... [son, simply, like, flavor, jar, like, pea, sp... [son, simpli, like, flavor, jar, like, pea, sp...
91449 2011-05-03 08:00:00 J. S. Bowen THEY HAVE CHANGED THIS TEA I USED TO LOVE THIS TEA WHEN IT WAS CALLED "WH... 1 not satisfied used love tea called white tea made peony plai... [used, love, tea, called, white, tea, made, pe... [us, love, tea, call, white, tea, made, peoni,...
91451 2010-05-21 08:00:00 Jennifer Hines "Jen H" Great Cocoa, Priced Too High While I love the taste of Green Mountain Hot C... 2 not satisfied love taste green mountain hot cocoa price k cu... [love, taste, green, mountain, hot, cocoa, pri... [love, tast, green, mountain, hot, cocoa, pric...
91454 2011-10-04 08:00:00 B. McMahon Yuck I didn't like this brand of coconut water it h... 2 not satisfied like brand coconut water strange taste brand l... [like, brand, coconut, water, strange, taste, ... [like, brand, coconut, water, strang, tast, br...
91455 2012-10-10 08:00:00 Burnadette Cerda Not as good as I thought it would be. The seller was amazing and fast so I would ord... 1 not satisfied seller amazing fast would order sad say tea ex... [seller, amazing, fast, would, order, sad, say... [seller, amaz, fast, would, order, sad, sai, t...

40000 rows × 9 columns

Split train_test set

# a function to split the data into a training set and a testing set and print a summary

from sklearn.model_selection import train_test_split
# Train Test Split Function
def split_train_test(df2, test_size=0.3, shuffle_state=True):
    X_train, X_test, Y_train, Y_test = train_test_split(df2[['stemmed_tokens']], 
                                                        df2['Satisfied'], 
                                                        shuffle=shuffle_state,
                                                        test_size=test_size, 
                                                        random_state=15)
    print("Value counts for Train sentiment")
    print(Y_train.value_counts())
    print('\n')
    print("Value counts for Test sentiments")
    print(Y_test.value_counts())
    print('\n')
    print(type(X_train))
    print(type(Y_train))
    print('\n')
    X_train = X_train.reset_index()
    X_test = X_test.reset_index()
    Y_train = Y_train.to_frame()
    Y_train = Y_train.reset_index()
    Y_test = Y_test.to_frame()
    Y_test = Y_test.reset_index()
    print(X_train.head())
    return X_train, X_test, Y_train, Y_test
X_train, X_test, Y_train, Y_test = split_train_test(df2)
Value counts for Train sentiment
satisfied        14027
not satisfied    13973
Name: Satisfied, dtype: int64


Value counts for Test sentiments
not satisfied    6027
satisfied        5973
Name: Satisfied, dtype: int64


<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>


      id                                     stemmed_tokens
0  50319  [pleas, try, avoid, disappoint, wai, pack, qua...
1  58934  [deliveri, quick, easi, howev, product, best, ...
2  30562  [order, brother, law, like, coffe, ship, slow,...
3   6542  [on, best, chocol, bar, tast, recommend, frien...
4  55060  [want, like, coffe, like, bui, came, huge, bag...

X_train: the training part of the first sequence (x), here the stemmed tokens
X_test: the test part of the first sequence (x)
Y_train: the training part of the second sequence (y), here the 'Satisfied' labels
Y_test: the test part of the second sequence (y)

More detail on splitting training and testing sets: https://realpython.com/train-test-split-python-data/#:~:text=x_train%20%3A%20The%20training%20part%20of,of%20the%20second%20sequence%20(%20y%20)
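
The split above is not stratified, so the two classes end up slightly uneven (14,027 vs 13,973 in the training set). As a possible variation, not used in this notebook, train_test_split accepts a stratify argument that keeps the class ratio the same in both parts. The toy frame below is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 reviews per class
toy = pd.DataFrame({
    "stemmed_tokens": [["tast", "good"], ["bad", "coffe"]] * 10,
    "Satisfied": ["satisfied", "not satisfied"] * 10,
})

X_tr, X_te, y_tr, y_te = train_test_split(
    toy[["stemmed_tokens"]],
    toy["Satisfied"],
    test_size=0.3,              # 70:30 split, as in the notebook
    random_state=15,
    stratify=toy["Satisfied"],  # keep the class ratio equal in both parts
)

print(len(X_tr), len(X_te))
print(y_tr.value_counts())
print(y_te.value_counts())
```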

X_train
id stemmed_tokens
0 50319 [pleas, try, avoid, disappoint, wai, pack, qua...
1 58934 [deliveri, quick, easi, howev, product, best, ...
2 30562 [order, brother, law, like, coffe, ship, slow,...
3 6542 [on, best, chocol, bar, tast, recommend, frien...
4 55060 [want, like, coffe, like, bui, came, huge, bag...
... ... ...
27995 71548 [found, dry, stale, also, rel, high, calori, p...
27996 88314 [greet, blew, first, request, review, howev, r...
27997 3467 [fantast, realli, le, calori, fat, eat, spoon,...
27998 10307 [number, cat, medic, problem, tri, number, wai...
27999 9732 [us, head, shoulder, moder, dandruff, work, we...

28000 rows × 2 columns

X_test
id stemmed_tokens
0 80957 [though, show, differ, flavor, bag, realli, at...
1 90660 [follow, review, larger, size, product, offer,...
2 79126 [watch, advertis, tofu, noodl, decid, try, sup...
3 17701 [kid, seriou, allergi, tri, bake, muffin, us, ...
4 8261 [on, pack, pack, price, care, mistak, two, pac...
... ... ...
11995 2788 [great, choic, like, cinnamon, roll, flavor, c...
11996 13761 [get, cake, mix, auto, deliveri, long, rememb,...
11997 20579 [barri, farm, establish, oct, bill, linda, bar...
11998 19695 [love, herbal, tea, delici, tast, like, oolong...
11999 17727 [dog, alwai, consum, love, pedigre, ag, need, ...

12000 rows × 2 columns

Y_train
id Satisfied
0 50319 not satisfied
1 58934 not satisfied
2 30562 not satisfied
3 6542 satisfied
4 55060 not satisfied
... ... ...
27995 71548 not satisfied
27996 88314 not satisfied
27997 3467 satisfied
27998 10307 satisfied
27999 9732 satisfied

28000 rows × 2 columns

Y_test
id Satisfied
0 80957 not satisfied
1 90660 not satisfied
2 79126 not satisfied
3 17701 satisfied
4 8261 not satisfied
... ... ...
11995 2788 satisfied
11996 13761 satisfied
11997 20579 not satisfied
11998 19695 satisfied
11999 17727 satisfied

12000 rows × 2 columns

Word2Vec

Feature extraction: we will use a Word2Vec model. The code below trains it on the stemmed tokens of our own reviews rather than loading pre-trained vectors.

from gensim.models import Word2Vec
import time
# Skip-gram model (sg = 1)
vector_size = 1000
window = 5
min_count = 1
workers = 3
sg = 1

word2vec_model_file = 'word2vec_' + str(vector_size) + '.model'
start_time = time.time()
stemmed_tokens = pd.Series(df2['stemmed_tokens']).values
# Train the Word2Vec Model
w2v_model = Word2Vec(stemmed_tokens, min_count = min_count, vector_size = vector_size, workers = workers, window = window, sg = sg)
print("Time taken to train word2vec model: " + str(time.time() - start_time))

Because this process might take a long time, I save the model to the file named in ‘word2vec_model_file’

w2v_model.save(word2vec_model_file)

# after fitting the model, each word in our reviews is represented as a vector
# so we can now measure how closely related those words are

# Load the model from the model file
w2v_model = Word2Vec.load(word2vec_model_file)

# Most Similar word
print(w2v_model.wv.most_similar('well'))

#Now words that are used in similar contexts are represented by vectors with a high similarity score
w2v_model.wv.similarity('good', 'worthwhil')
[('lastli', 0.6035114526748657), ('strictli', 0.5986160039901733), ('perfectli', 0.5976307988166809), ('bravo', 0.5926766395568848), ('newest', 0.5891979336738586), ('swap', 0.5885021686553955), ('definitli', 0.5863612294197083), ('gosh', 0.5830129384994507), ('creativ', 0.5791662335395813), ('lacklust', 0.5786160230636597)]
0.69994974
# The model can pick out the word that does not belong to a group
w2v_model.wv.doesnt_match(['good', 'charm', 'amazingli','bad','well']) 
'bad'
w2v_model.wv.similarity('good', 'worthwhil')
0.69994974
w2v_model.wv.similarity('bad', 'bitter')
0.36606473
w2v_model.wv.similarity('bad', 'good') # need to fix this
0.5548429
w2v_model.wv.most_similar(positive="bad")
[('keen', 0.6243119239807129),
 ('terribl', 0.6219682693481445),
 ('becuas', 0.6180531978607178),
 ('echo', 0.6136906743049622),
 ('yucki', 0.6117876172065735),
 ('swear', 0.607035219669342),
 ('medicinei', 0.6067305207252502),
 ('wierd', 0.6061832308769226),
 ('unpalat', 0.6051055788993835),
 ('interestingli', 0.604747474193573)]
w2v_model.wv.most_similar(positive="chip")
[('popchip', 0.6958118081092834),
 ('kettl', 0.6679425239562988),
 ('ahoi', 0.6633185744285583),
 ('tortilla', 0.6560363173484802),
 ('frito', 0.648652195930481),
 ('terra', 0.6212309002876282),
 ('lai', 0.6164361834526062),
 ('mesquit', 0.6151025295257568),
 ('pringl', 0.6080288887023926),
 ('pretzel', 0.6054346561431885)]
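
The scores returned by similarity and most_similar are cosine similarities between word vectors. A minimal sketch of that computation with NumPy follows; the three short vectors are made-up examples, not vectors from the trained model.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up vectors for illustration only
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]   # points in the same direction as a
c = [-2.0, 1.0, 0.0]  # perpendicular to a

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)
print(sim_ab, sim_ac)
```

Because the score only reflects how words are used in context, opposites such as 'bad' and 'good' can still score fairly high: they appear in very similar sentences.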

Core Process of Word2Vec

From now on, we work with the training set to fit the model before making predictions. We loop through X_train and X_test, which were split earlier, take the mean of the word vectors in each review, and use that mean to represent the tone of the review.

#for the training set
# we take the mean of the word vectors in each review and use it to represent the tone of that review
#then write the means to a csv file.
word2vec_filename = 'train_review_word2vec.csv'
with open(word2vec_filename, 'w+') as word2vec_file:
    for index, row in X_train.iterrows():
        model_vector = (np.mean([w2v_model.wv[token] for token in row['stemmed_tokens']], axis=0)).tolist()
        if index == 0:
            header = ",".join(str(ele) for ele in range(1000))
            word2vec_file.write(header)
            word2vec_file.write("\n")
        # If the mean is a list, write it; otherwise (the review had no tokens) write a vector of zeros
        if type(model_vector) is list:  
            line1 = ",".join( [str(vector_element) for vector_element in model_vector] )
        else:
            line1 = ",".join([str(0) for i in range(1000)])
        word2vec_file.write(line1)
        word2vec_file.write('\n')
C:\Python\lib\site-packages\numpy\core\fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
C:\Python\lib\site-packages\numpy\core\_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
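
The RuntimeWarning above appears when a review has no tokens left after cleaning: np.mean receives an empty list and returns nan, and the loop then falls into the else branch and writes zeros. A small sketch of one way to handle that case explicitly is shown here. The helper review_vector is hypothetical, and a plain dict stands in for w2v_model.wv.

```python
import numpy as np

def review_vector(tokens, vectors, size):
    """Average the vectors of the known tokens; return zeros if there are none."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        # Empty review (or only unknown words): skip np.mean to avoid the warning
        return np.zeros(size)
    return np.mean(known, axis=0)

# Toy 3-dimensional vectors standing in for the trained word vectors
toy_vectors = {
    "good": np.array([1.0, 0.0, 2.0]),
    "coffe": np.array([3.0, 2.0, 0.0]),
}

full = review_vector(["good", "coffe"], toy_vectors, size=3)
empty = review_vector([], toy_vectors, size=3)
print(full)
print(empty)
```

Collecting these vectors with np.vstack would also make it possible to pass them to the classifier directly instead of going through a csv file, at the cost of keeping everything in memory.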
#for the testing set
#find the mean
#and also write the means to a csv file.
word2vec_filename = 'test_review_word2vec.csv'
with open(word2vec_filename, 'w+') as word2vec_file:
    for index, row in X_test.iterrows(): # iterrows() loops over each review so we can compute the mean vector that represents it
        model_vector = (np.mean([w2v_model.wv[token] for token in row['stemmed_tokens']], axis=0)).tolist()
        if index == 0:
            header = ",".join(str(ele) for ele in range(1000))
            word2vec_file.write(header)
            word2vec_file.write("\n")
        # If the mean is a list, write it; otherwise (the review had no tokens) write a vector of zeros
        if type(model_vector) is list:  
            line1 = ",".join( [str(vector_element) for vector_element in model_vector] )
        else:
            line1 = ",".join([str(0) for i in range(1000)])
        word2vec_file.write(line1)
        word2vec_file.write('\n')
#now we need an algorithm for prediction; in this case, I use RandomForestClassifier
import time
#import RandomForestClassifier, the algorithm that will be used for classification
from sklearn.ensemble import RandomForestClassifier

# Load from the filename
trainvec = pd.read_csv('train_review_word2vec.csv') # training
testvec = pd.read_csv('test_review_word2vec.csv') # testing

#Initialize the model
forest_word2vec = RandomForestClassifier(n_estimators = 100)

start_time = time.time()
# Fit the model
forest_word2vec.fit(trainvec, Y_train['Satisfied']) # fit the model: build the decision trees from the training vectors
print("Time taken to fit the model with word2vec vectors: " + str(time.time() - start_time))
Time taken to fit the model with word2vec vectors: 79.27750396728516
#use the fitted model to predict the result
# the prediction for each review in the test set is either 'satisfied' or 'not satisfied'
result = forest_word2vec.predict(testvec) 
result.shape
(12000,)
result[::10]
array(['not satisfied', 'satisfied', 'satisfied', ..., 'not satisfied',
       'satisfied', 'satisfied'], dtype=object)
#append the result to our test set
Y_test['Predict'] = result
Y_test['review'] = X_test['stemmed_tokens']
# the end result
Y_test[::500]
id Satisfied Predict review
0 80957 not satisfied not satisfied [though, show, differ, flavor, bag, realli, at...
500 80996 not satisfied not satisfied [whenth, packag, arriv, two, can, open, empti,...
1000 11473 satisfied not satisfied [crunchi, cooki, creami, center, cooki, part, ...
1500 43875 not satisfied not satisfied [complet, underwhelm, overpr, would, consid, t...
2000 20155 satisfied satisfied [year, old, finicki, cat, absolut, love, food,...
2500 15199 satisfied not satisfied [oh, boi, dubbl, bubbl, bubbl, gum, giant, ind...
3000 14302 satisfied not satisfied [good, appl, soft, hard, flavor, good, packag,...
3500 7918 satisfied satisfied [cook, grain, soft, creami, tast, somewhat, ri...
4000 91399 not satisfied satisfied [four, rescu, cat, give, iam, said, organ, tho...
4500 36933 not satisfied not satisfied [mind, pai, bag, dog, love, expens, look, bag,...
5000 2063 satisfied not satisfied [struggl, dry, skin, excit, try, product, dove...
5500 24499 satisfied satisfied [love, almond, flour, graini, us, bake, fine, ...
6000 20320 not satisfied satisfied [want, try, someth, healthi, differ, meusli, e...
6500 19246 satisfied satisfied [italian, greyhound, love, treat, yet, notic, ...
7000 14868 satisfied satisfied [wife, absolut, ador, cocoa, easi, make, hot, ...
7500 18741 satisfied satisfied [alwai, love, walker, shortbread, shape, size,...
8000 49797 not satisfied not satisfied [month, old, practic, live, cheerio, plum, org...
8500 67583 not satisfied not satisfied [enjoi, dip, dress, compani, dip, strong, bitt...
9000 37978 not satisfied not satisfied [us, yogi, tea, lemon, ginger, tea, chang, wan...
9500 22370 satisfied satisfied [done, recur, shipment, product, steal, carb, ...
10000 241 satisfied satisfied [tulli, hous, blend, becom, coffe, alwai, hand...
10500 54662 not satisfied not satisfied [like, type, coffe, surpris, blend, suit, tast...
11000 73658 not satisfied not satisfied [purchas, option, cat, sinc, eat, chicken, muc...
11500 352 satisfied satisfied [love, coffe, wonder, flavor, take, litll, mak...

Evaluate the model

#using a function to evaluate the model
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(Y_test['Satisfied'],result, zero_division=0))
               precision    recall  f1-score   support

not satisfied       0.80      0.79      0.79      6027
    satisfied       0.79      0.80      0.79      5973

     accuracy                           0.79     12000
    macro avg       0.79      0.79      0.79     12000
 weighted avg       0.79      0.79      0.79     12000

Confusion Matrix

#visualizing the evaluation of the model with a heatmap
# we compute the confusion matrix and feed it into the heatmap
cf_matrix = confusion_matrix(Y_test['Satisfied'], result)
cf_matrix
array([[4758, 1269],
       [1198, 4775]], dtype=int64)
#visualizing the evaluation of the model with a heatmap
import seaborn as sns

# fmt='d' prints the counts as integers
ax = sns.heatmap(cf_matrix, annot=True, cmap='YlGn', fmt='d')

ax.set_title('Seaborn Confusion Matrix with labels\n\n')
ax.set_xlabel('\nPredicted Values')
ax.set_ylabel('Actual Values ')

## Tick labels - list must be in alphabetical order to match the rows and columns of confusion_matrix
ax.xaxis.set_ticklabels(['not satisfied','satisfied'])
ax.yaxis.set_ticklabels(['not satisfied','satisfied'])

## Display the visualization of the Confusion Matrix.
plt.show()
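
The figures in the classification report can be recomputed by hand from the confusion matrix printed above, where rows are the actual classes and columns the predicted ones, both in alphabetical order. The sketch below uses plain Python and the counts from that output; the variable names are only illustrative.

```python
labels = ["not satisfied", "satisfied"]
# Counts copied from the confusion matrix output: rows = actual, columns = predicted
cm = [[4758, 1269],
      [1198, 4775]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

metrics = {}
for i, label in enumerate(labels):
    true_positive = cm[i][i]
    predicted_as_label = sum(cm[r][i] for r in range(len(labels)))  # column sum
    actually_label = sum(cm[i])                                     # row sum
    precision = true_positive / predicted_as_label
    recall = true_positive / actually_label
    f1 = 2 * precision * recall / (precision + recall)
    metrics[label] = (precision, recall, f1)
    print(f"{label:>13}  precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")

print(f"accuracy={accuracy:.2f}")
```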


Special thanks to: https://medium.com/swlh/sentiment-classification-using-word-embeddings-word2vec-aedf28fbb8ca and all the related wonderful posts on Stack Overflow